Learning for Sequence Extraction Tasks

Authors

  • Massih-Reza Amini
  • Hugo Zaragoza
  • Patrick Gallinari
Abstract

We consider the application of machine learning techniques for sequence modeling to Information Retrieval (IR) and surface Information Extraction (IE) tasks. We introduce a generic sequence model and show how it can be used to handle different closed-query tasks. Taking the sequential nature of texts into account allows for a finer analysis than is usually done in IR with static text representations. The task we focus on is the retrieval and labeling of text passages, also known as highlighting and surface information extraction. We describe different implementations of our model based on Hidden Markov Models and Neural Networks. Experiments are performed using the MUC6 corpus from the information extraction community.

Similar resources

An Analysis of Active Learning Strategies for Sequence Labeling Tasks

Active learning is well-suited to many problems in natural language processing, where unlabeled data may be abundant but annotation is slow and expensive. This paper aims to shed light on the best active learning approaches for sequence labeling tasks such as information extraction and document segmentation. We survey previously used query selection strategies for sequence models, and propose s...

Tree Sequence Kernel for Natural Language

We propose Tree Sequence Kernel (TSK), which implicitly exhausts the structure features of a sequence of subtrees embedded in the phrasal parse tree. By incorporating the capability of sequence kernel, TSK enriches tree kernel with tree sequence features so that it may provide additional useful patterns for machine learning applications. Two approaches of penalizing the substructures are propos...

Novel metrics for feature extraction stability in protein sequence classification

Feature extraction is an unavoidable task, especially in the critical step of preprocessing biological sequences. This step consists for example in transforming the biological sequences into vectors of motifs where each motif is a subsequence that can be seen as a property (or attribute) characterizing the sequence. Hence, we obtain an object-property table where objects are sequences and proper...

Exploring Relational Features and Learning under Distant Supervision for Information Extraction Tasks

Information Extraction (IE) has become an indispensable tool in our quest to handle the data deluge of the information age. IE can broadly be classified into Named-entity Recognition (NER) and Relation Extraction (RE). In this thesis, we view the task of IE as finding patterns in unstructured data, which can either take the form of features and/or be specified by constraints. In NER, we study t...

Group scheduling considering the learning effect in a cellular manufacturing system

The group scheduling problem in the cellular manufacturing system is comprised of two levels of scheduling. At the first level, the sequence of parts in each part-family is determined, and then at the second level the sequence of part-families is determined. In this paper, the flow shop group scheduling is investigated in order to minimize the makespan. In traditional group scheduling problems,...

Publication date: 2000